Bounded rational decision-making models suggest capacity-limited concurrent motor planning in human posterior parietal and frontal cortex

While traditional theories of sensorimotor processing have often assumed a serial decision-making pipeline, more recent approaches have suggested that multiple actions may be planned concurrently and vie for execution. Evidence for the latter almost exclusively stems from electrophysiological studies in posterior parietal and premotor cortex of monkeys. Here we study concurrent prospective motor planning in humans by recording functional magnetic resonance imaging (fMRI) during a delayed response task engaging movement sequences towards multiple potential targets. We find that also in human posterior parietal and premotor cortex delay activity modulates both with sequence complexity and the number of potential targets. We tested the hypothesis that this modulation is best explained by concurrent prospective planning as opposed to the mere maintenance of potential targets in memory. We devise a bounded rationality model with information constraints that optimally assigns information resources for planning and memory for this task and determine predicted information profiles according to the two hypotheses. When regressing delay activity on these model predictions, we find that the concurrent prospective planning strategy provides a significantly better explanation of the fMRI-signal modulations. Moreover, we find that concurrent prospective planning is more costly and thus limited for most subjects, as expressed by the best fitting information capacities. We conclude that bounded rational decision-making models allow relating both behavior and neural representations to utilitarian task descriptions based on bounded optimal information-processing assumptions.

(A) Trials of the motor planning conditions in the delayed response task (DRT) and the control condition (CT) were randomized and had a similar timeline. Each trial started with a fixation period (FIX) of random duration, followed by a cue phase (CUE), in which a cue stimulus s_1 was presented, a random dot pattern to mask afterimages of the cue (MASK), a delay phase (DEL) of random duration for planning, and a response phase (RES). In the response phase (RES) the ultimate target location was revealed by a go-signal s_2, namely a single grey frame around the relevant (part of the) respective response panel. Response trajectories were generated by subjects' button presses on the illustrated input device. In the control task, no potential target cues were presented during the cue phase and therefore no goal-directed actions could be planned during the delay phase. In the response phase of the CT, subjects were finally informed about the target location (single grey box). (B) The four target planning conditions differed in the number of potential target locations initially cued. Subjects could see possible target positions highlighted (grey boxes). Potential targets to consider for planning were indicated by a grey frame around the field areas encompassing the relevant possible targets. Subjects were instructed to plan a single movement sequence towards one target ('1'), two partially overlapping sequences towards two potential targets ('11'), two distinct movement sequences towards two potential targets in different panels ('2'), or four distinct movement sequences towards four potential targets ('4'), respectively, and to execute the planned goal-directed movement as sequential button presses with the right thumb in the following phase, after the ultimate target was revealed by the go-signal.
During the DEL phase, preparation for a goal-directed action could follow a serial memory-based planning strategy (null hypothesis H_0: "delayed planning"), or a parallel prospective planning strategy (hypothesis H_1: "concurrent prospective planning") where possible planning of actions as movement paths to each of the potential target locations could be anticipated.

Figure 2: Behavioral performance. Results are reported as the mean over subjects' individual task performance and normalized standard error (according to [31] to eliminate the between-subject variance, which does not contribute to the within-subject effect of the study). Results of the one-sided pairwise post-hoc comparisons of the respective marginal means are indicated with *** for p <= 0.001, ** for 0.001 < p <= 0.01, and ns for non-significant results (p > 0.05). (A) Reaction times (RT) and (B) movement times (MT) were calculated as averages across individual subjects' means ± SEM. Reaction times were significantly decreased for easy planning conditions ('1', '11' and '2') compared to the control condition CT, which indicates a benefit due to planning ('1' vs. CT, p < 0.0001; '11' vs. CT, p = 0.0002; '2' vs. CT, p = 0.0167). RT were higher in the more complex planning condition ('4') and did not significantly differ from CT ('4' vs. CT, p = 0.933). Differences for varying sequence length were found for MT but not for RT. (C) Error rates averaged across subjects indicated varying task difficulty over conditions, revealed by an increase of target misses for conditions with higher task uncertainty (p = 0.024 for '11' vs. CT; p = 0.005 for '2' vs. CT; p < 0.001 for '4' vs. CT) and between 2- and 3-step conditions and 2- and 4-step conditions (p < 0.0001 for 2 vs. 3; p < 0.0001 for 2 vs. 4). (D) During the delay phase there was no significant difference between the average number of eye saccades, and therefore saccades cannot explain the benefit in reaction times or the fMRI activity in planning-related areas.
('1' vs. CT, p < 0.0125; '11' vs. CT, p < 0.0167; '2' vs. CT, p < 0.025), but not condition

We report across-subjects averages and within-subjects variance as the normalized standard error (according to [31]). Time course activity and GLM estimates of further ROIs (contralateral cortical ROIs, left, and ipsilateral cerebellar ROIs, right) are provided in Supplementary Figure S1. Average MNI coordinates (x, y, z in mm) are provided for each ROI. Statistical results are indicated with *** for p <= 0.001, ** for 0.001 < p <= 0.01, * for 0.01 < p < 0.05, and ns for non-significant results (p > 0.05). (B) In a 'non-planning ROI' (V1l) BOLD signal changes were not significantly different between conditions. (C) The second-level activation map shows significant delay-related fMRI activity across subjects in planning-related areas (see Results for details) when contrasting DRT conditions vs. the control condition CT (p < 0.05, family-wise error (FWE)-corrected for multiple comparisons).

Figure S1 with PMdl, antIPSl, DLPFCl, AICl, cer6r, cer8r, SMA) seemed compatible with concurrent prospective planning.

the remaining uncertainty about the actual hidden target w (see Figure 4). Accordingly, the overall process can be described with two information quantities (I_1 and I_2) that depend on the respective task condition (e.g. the number of potential targets and the length of the action sequence that leads to these targets). In a first step, the information I_1 is needed to form and maintain a memory m ∈ M of the initial cue, which is needed to infer the world state when seeing s_2 after the delay. The process of memory formation therefore partially resolves some uncertainty about w given the amount of information that s_1 provides. This process and its associated information I_1 are identical in both planning strategies, delayed planning (H_0) and concurrent prospective planning (H_1).

In the second processing step, uncertainty about actions gets reduced. This is quantified by additional information I_2, that, however, differs for the delayed planning and

Information about the hidden world state w ∈ W (the exact target location) is revealed in two steps during the experimental trial. The initial cue stimulus s_1 ∈ S_1 indicates potential target locations while the later go-signal s_2 resolves any remaining uncertainty about the actual hidden target w. (B) When seeing s_1, the decision-maker can in a first step form a memory m ∈ M, which is required to infer the world state when seeing s_2 after the delay, and is then able to appropriately plan and finally select an action a ∈ A that corresponds to a movement path representation. The two hypotheses (H_0 and H_1) contrast the predicted information I_1 and I_2 in all 12 DRT conditions for a delayed planning strategy and a concurrent prospective planning strategy during the delay (DEL). H_0: During the delay phase only memory m is used to reduce uncertainty about actions. H_1: Concurrent prospective planning of actions during the delay phase requires the anticipation of all possible selection stimuli s_2 and planning movement sequences in parallel. I_2 therefore is higher in H_1 than in H_0 and, moreover, varies more strongly with movement sequence length. (C) Model comparison. We regressed measured fMRI activity in relevant brain areas during the delay phase on the theoretical memory and prospective planning information and tested two hypotheses: H_0, where information processing merely requires uncertainty reduction based on memory formation m, and H_1, with higher information-processing effort required for prospective planning to anticipate possible future movements.
Theoretical information values I_1 and I_2 for different memory and planning capacities, defined by model parameters β_1 and β_2, were regressed (using regression coefficients α_1 and α_2) with the measured fMRI BOLD activity. For lower β_1 and β_2, memory and planning capacity accordingly is more limited. We regressed the information values for different degrees of boundedness and compared the best model fits with the maximal information values in the case without bounds, where β_1 and β_2 were chosen maximally (β_1 = 500, β_2 = 500).

concurrent prospective planning (hypothesis H_1) or serial delayed planning (hypothesis

subjects to react more quickly to an upcoming scenario [32], it is immediately clear that this type of parallel planning cannot be scaled up indefinitely and that there must be bounds on the amount of information processing. For this reason we also assess these capacity limits of information processing.

In order to formally compare the two planning hypotheses -concurrent prospective

The exact amounts of information (I_1 and I_2) depend on the capacity of the information channels defined by model parameters β_1 and β_2 for the memory and the action process.

the fMRI signal alone, it seems that subjects were limited in their information processing.

As a control, the R^2-value of the information modulation in primary visual area V1l was

Results are reported as the mean over R^2-values for subjects' individual best model fit for H_1 and normalized standard error (according to [31] to eliminate the between-subject variance, which does not contribute to the within-subject effect of the study). (B) Theoretical expected information costs I_1 (left) and I_2 (right) of prospective planning are based on model parameters determining memory and planning capacities fitted for all individual subjects (hypothesis H_1 "bounded"). Histograms represent the frequency distributions of the information costs, dependent on task conditions. Information values varied between bounded model predictions regressed to subjects' fMRI data compared to the model predictions in the not-bounded case (in blue). Note that generally subjects lie below the information cost of the not-bounded decision-maker, and only for the simplest 2-step conditions do subjects' information costs for memory (I_1) deviate in the opposite direction from the predictions in the not-bounded case. This results from the specific task design and the fact that bounds are provided on the total expected information. Decision-makers with not-bounded planning capacities can optimally assign resources to more difficult conditions and save memory costs for 2-step conditions, because no uncertainty about the target location will remain when s_2 is revealed, given that all actions are planned prospectively under a high information resource for I_2.
predicted information boundaries Tables 7 and 8) and is higher for subjects with higher capacity limits for I_1 and I_2.

performance. Thus, we may conclude that our optimal information model has significant predictive power with respect to subjects' behavioral performance from fMRI activity recorded during the planning phase in our delayed-response task.

Neural correlates of planning for sequential finger movements (yet without spatial targets) have previously been studied for example in the context of "competitive queuing"

Similar to these previous studies where behavioral task performance and information processing capacities were measured, we here use a normative probabilistic optimality model for uncertainty reduction and parallel processing. In this study, however, we go a step further and relate the predicted information flow with fMRI activity during motor planning. In particular, measuring the activity of premotor and parietal planning areas during the delayed response tasks allowed us to measure planning capacity for movement preparation. As the bounded rationality framework allows for multiple information con-

Table 9.

For each subject and ROI, we extracted the GLM mean model parameters for the regressors of interest from a 3 mm radius sphere around each individual ROI coordinate. These beta parameters were (session-wise) normalized to the residual beta for any given session (i.e. baseline) to provide an estimate of the %-signal change of the fMRI signal.

Theoretical Methods

Bounded Rationality Model We applied the information-theoretic bounded rationality framework [28, 103] to our experimental task, where information about the hidden world state w ∈ W (the exact target location) is revealed in two steps, first by the initial cue stimulus s_1 ∈ S_1 indicating all possible target locations, and after a delay by the go-signal s_2 ∈ S_2 that resolves the remaining uncertainty about the actual hidden target w. After seeing the cue stimulus, the agent can form a memory m ∈ M during the planning delay phase, before perceiving the go-signal s_2 and subsequently selecting an action a ∈ A that corresponds to a movement sequence leading to a target location. The sets W, S_1, S_2, M and A are finite and discrete, with |W| = 40 possible target locations (at a distance of 2, 3 or 4 steps), |A| = 80 possible movement sequences, |S_2| = 8 possible half-sized frames for each panel and |S_1| = |M| = 820 possible combinations of potential targets over all conditions. For any particular target condition c_T and step condition c_S, we then get a uniform distribution over stimuli

$$p(s_1) = \begin{cases} 1 / N_{c_T, c_S} & \text{if } s_1 \in S_{c_T, c_S} \\ 0 & \text{otherwise,} \end{cases}$$

where S_{c_T,c_S} is the subset of stimuli that belong to the condition (c_T, c_S) and N_{c_T,c_S} = |S_{c_T,c_S}| is the number of stimuli in that subset.

In the model, the agent chooses the action a to maximize the task utility U(w, a) under the constraint that only a certain amount of information processing can be achieved when forming the memory m and selecting the action a. For our task the utility function U is a simple 0/1-utility: it is 1 whenever the action a is compatible with the hidden target w and 0 otherwise.
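To illustrate the trade-off between utility and information processing, the following is a minimal single-step sketch, not the two-stage memory and action model used in this study. It assumes the soft-max form commonly used in information-theoretic bounded rationality, p(a|w) ∝ p(a) exp(β U(w, a)), iterated together with the marginal p(a). The toy task (4 targets, 4 actions) and all function names are hypothetical.

```python
import math

def bounded_policy(p_w, utility, beta, n_iter=100):
    """Alternate p(a|w) ∝ p(a) * exp(beta * U(w, a)) and p(a) = sum_w p(w) p(a|w)."""
    n_w, n_a = len(p_w), len(utility[0])
    p_a = [1.0 / n_a] * n_a
    policy = []
    for _ in range(n_iter):
        policy = []
        for w in range(n_w):
            weights = [p_a[a] * math.exp(beta * utility[w][a]) for a in range(n_a)]
            total = sum(weights)
            policy.append([x / total for x in weights])
        p_a = [sum(p_w[w] * policy[w][a] for w in range(n_w)) for a in range(n_a)]
    return policy, p_a

def expected_utility(p_w, policy, utility):
    """E[U] under the task distribution p(w) and the policy p(a|w)."""
    return sum(p_w[w] * policy[w][a] * utility[w][a]
               for w in range(len(p_w)) for a in range(len(policy[w])))

def information_bits(p_w, policy, p_a):
    """Average KL divergence between p(a|w) and p(a), in bits."""
    return sum(p_w[w] * q * math.log2(q / p_a[a])
               for w in range(len(p_w)) for a, q in enumerate(policy[w]) if q > 0)

# Hypothetical toy task: 4 equiprobable targets, 4 actions, 0/1 utility.
p_w = [0.25] * 4
utility = [[1.0 if a == w else 0.0 for a in range(4)] for w in range(4)]

for beta in (1.0, 5.0, 500.0):
    policy, p_a = bounded_policy(p_w, utility, beta)
    print(beta,
          round(expected_utility(p_w, policy, utility), 3),
          round(information_bits(p_w, policy, p_a), 3))
```

In this toy case a small β yields a nearly uniform policy with low information cost and low expected utility, while a large β (such as the value 500 used here for the case without bounds) approaches the deterministic maximum-utility policy.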

Since information processing can be unreliable, the agent's memory state and action policy are formalized by probability distributions p(m|s_1) and p(a|m, s_2), such that the amount of information processing can be captured by the Kullback-Leibler divergence between the prior distributions p(m) and p(a) and the posterior distributions p(m|s_1) and p(a|m, s_2), respectively. The bounded rational decision-making problem can then be written as a constrained optimization problem where the expectation is taken with respect to the distribution p(w, s_1, s_2, m, a) = p(w|s_1, s_2) p(s_1) p(s_2|s_1) p(m|s_1) p(a|m, s_2), with p(s_1), p(s_2|s_1) and p(w|s_1, s_2) defined by the task, and p(m|s_1) and p(a|m, s_2) left for optimization. Accordingly, the expected utility is given by

$$E[U] = \sum_{a, m, s_1, s_2, w} p(w \mid s_1, s_2)\, p(s_2, s_1)\, p(m \mid s_1)\, p(a \mid m, s_2)\, U(w, a),$$

where U(w, a) = 1 if the action a leads to a target hit with target w and U(w, a) = 0 otherwise. Thus, we have E[U] = 1 − ErrorRate.
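As a small numerical illustration of these quantities, the sketch below computes the average Kullback-Leibler divergence between p(m|s_1) and p(m) for a toy memory channel, together with the resulting hit and error rates under a 0/1 utility. The channel (4 cues, 4 memory states, 70% reliable) and the function names are hypothetical and much smaller than the 820 stimuli of the actual task.

```python
import math

def kl_bits(posterior, prior):
    """Kullback-Leibler divergence D_KL(posterior || prior) in bits."""
    return sum(q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0)

def memory_information(p_s1, p_m_given_s1):
    """I(M; S_1): average KL divergence between p(m|s_1) and the marginal p(m)."""
    n_m = len(p_m_given_s1[0])
    p_m = [sum(ps * row[m] for ps, row in zip(p_s1, p_m_given_s1)) for m in range(n_m)]
    return sum(ps * kl_bits(row, p_m) for ps, row in zip(p_s1, p_m_given_s1))

# Hypothetical toy memory channel: 4 equiprobable cues, 4 memory states.
p_s1 = [0.25] * 4
perfect = [[1.0 if m == s else 0.0 for m in range(4)] for s in range(4)]
noisy = [[0.7 if m == s else 0.1 for m in range(4)] for s in range(4)]

i_perfect = memory_information(p_s1, perfect)  # lossless channel
i_noisy = memory_information(p_s1, noisy)      # capacity-limited channel

# Toy 0/1 utility: a trial counts as a hit when the memory matches the cue,
# so the expected utility equals the hit rate, i.e. 1 - error rate.
hit_rate_noisy = sum(ps * row[s] for s, (ps, row) in enumerate(zip(p_s1, noisy)))
error_rate_noisy = 1.0 - hit_rate_noisy

print(i_perfect, i_noisy, hit_rate_noisy, error_rate_noisy)
```

The lossless channel needs the full 2 bits to distinguish four cues, whereas the noisy channel processes less information and pays for it with a lower expected utility.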

The information quantities I(M; S_1) and I(A; M) + I(A; S_2|M) respectively measure the average Kullback-Leibler divergence between the distributions p(m) and p(m|s_1) for memory formation and the average Kullback-Leibler divergence between the distributions p(a) and p(a|m, s_2) for generating a specific action given memory m and stimulus s_2. The parameters β_1 and β_2 reflect the degree of boundedness, where β_1, β_2 → ∞ reproduces a Bayes-optimal maximum expected utility decision-maker. The bounded optimal solution for equation 2 is given by

decision rule p(a|m, s_2) requires the agent to know the go-stimulus s_2 when deciding about the action a, information that is not available during the delay period. We consider two hypotheses for information processing during the delay phase.

• Hypothesis 0: Delayed Planning. Planning of the action a according to p(a|m, s_2) is delayed until the response phase, once s_2 is known. The delay phase is only used for uncertainty reduction, modeled by the mathematical transition from the prior p(m) to the posterior p(m|s_1) and in the space of actions from p(a) to p(a|m). In any particular condition (c_T, c_S), we would then expect the information costs

• Hypothesis 1: Prospective Planning. Once the uncertainty over actions is reduced to p(a|m) by observing s_1, all possible s_2 are anticipated in the delay phase, and for each s_2 an action is planned according to p(a|m, s_2). Depending on available information resources, the plans can be more or less precise. Once the actual go-stimulus s_2 is revealed during the response phase, one of the planned actions can be immediately carried out. During the delay phase in any particular condition (c_T, c_S), we would then expect the information costs
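The qualitative difference between the two hypotheses can be illustrated with a toy case. The sketch below assumes a uniform action prior p(a) over 4 actions, a perfectly stored cue that leaves 2 equally likely targets, and a planning cost under H_1 that is averaged over the anticipated go-signals. These are all illustrative assumptions, not the fitted model or the exact cost expressions of this study.

```python
import math

def kl_bits(posterior, prior):
    """Kullback-Leibler divergence D_KL(posterior || prior) in bits."""
    return sum(q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0)

# Hypothetical toy setting: 4 actions with a uniform prior p(a). The cue s_1
# (stored perfectly in memory m) leaves 2 equally likely targets, and each
# possible go-signal s_2 singles out exactly one of them.
p_a = [0.25, 0.25, 0.25, 0.25]
p_s2_given_m = [0.5, 0.5]
p_a_given_m_s2 = [
    [1.0, 0.0, 0.0, 0.0],  # plan prepared for s_2 = 0
    [0.0, 1.0, 0.0, 0.0],  # plan prepared for s_2 = 1
]

# H_0 (delayed planning): during the delay only p(a) -> p(a|m) is resolved,
# where p(a|m) marginalizes over the still unknown go-signal.
p_a_given_m = [
    sum(p_s2_given_m[k] * p_a_given_m_s2[k][a] for k in range(2))
    for a in range(4)
]
i2_h0 = kl_bits(p_a_given_m, p_a)

# H_1 (prospective planning): one plan p(a|m, s_2) per anticipated s_2,
# here averaged over p(s_2|m) (an assumption of this sketch).
i2_h1 = sum(p_s2_given_m[k] * kl_bits(p_a_given_m_s2[k], p_a) for k in range(2))

print(i2_h0, i2_h1)  # 1.0 bit under H_0, 2.0 bits under H_1
```

In this toy case, prospective planning costs one additional bit during the delay, in line with the statement that I_2 is higher under H_1 than under H_0.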

Assuming a linear relationship between informational surprise and brain signal fMRI(c_T, c_S) for each condition (c_T, c_S), we get a linear regression model

$$\mathrm{fMRI}(c_T, c_S) = \alpha_1 I_1(c_T, c_S) + \alpha_2 I_2(c_T, c_S) + \alpha_0,$$

with model parameters α_i, i = 0, 1, 2.
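A minimal sketch of fitting this regression by ordinary least squares, using only the Python standard library, could look as follows. The information values and the synthetic signal are invented for illustration (the signal is generated from known coefficients so that the fit can be checked), and the helper names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear(i1, i2, y):
    """Least-squares fit of y = alpha_1*i1 + alpha_2*i2 + alpha_0 (normal equations)."""
    rows = [[u, v, 1.0] for u, v in zip(i1, i2)]
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    xty = [sum(r[j] * t for r, t in zip(rows, y)) for j in range(3)]
    return solve_linear(xtx, xty)  # [alpha_1, alpha_2, alpha_0]

# Hypothetical information values (bits) for six conditions.
i1 = [1.0, 1.5, 2.0, 2.5, 1.0, 2.0]
i2 = [0.5, 1.0, 1.0, 2.0, 1.5, 2.5]
# Synthetic noise-free signal generated from known coefficients.
y = [0.5 * u + 0.2 * v + 0.1 for u, v in zip(i1, i2)]

alpha_1, alpha_2, alpha_0 = fit_linear(i1, i2, y)
print(alpha_1, alpha_2, alpha_0)
```

With real data the fit would be repeated for each candidate pair (β_1, β_2), since the predicted I_1 and I_2 depend on the assumed capacities.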

For each of the two hypotheses H_0 and H_1, we tested the multilinear regression between the fMRI activity modulation and the two information modulations predicted by the models, using a nested-model F-statistic (α = 5%) to test whether I_2 significantly improves the regression. To this end, we applied the statistical test to the models with subjects' individual best-fitting capacities and to individual fMRI activities in all relevant ROIs.
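The nested-model F-statistic compares the residual sum of squares (RSS) of a reduced model (here assumed to contain α_0 and α_1 I_1 only) with that of the full model that additionally includes α_2 I_2. The sketch below uses invented RSS values and assumes one observation per DRT condition (12 in total). Both are illustrative assumptions rather than values from the study.

```python
def nested_f_statistic(rss_reduced, rss_full, n_obs, k_reduced, k_full):
    """F-statistic for comparing a reduced model nested in a full model.

    k_reduced and k_full count the fitted parameters, intercept included.
    """
    numerator = (rss_reduced - rss_full) / (k_full - k_reduced)
    denominator = rss_full / (n_obs - k_full)
    return numerator / denominator

# Hypothetical residual sums of squares for 12 conditions.
f_value = nested_f_statistic(
    rss_reduced=0.90,  # model with alpha_0 and alpha_1 * I_1 only
    rss_full=0.45,     # model that also includes alpha_2 * I_2
    n_obs=12,
    k_reduced=2,
    k_full=3,
)
print(f_value)  # compare against the critical value of F(1, 9)
```

At α = 5% the critical value of the F(1, 9) distribution is about 5.12, so the toy value above would count as a significant improvement. If SciPy is available, `scipy.stats.f.sf(f_value, 1, 9)` gives the corresponding p-value.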